2 Hidden Layers

Programming a neural network with 2 hidden layers is essentially the same process as with a single hidden layer, with one extra step. Because there is an extra layer, there is a third weight array and one more application of the chain rule in the gradient descent step. Thus there is a bit more math and a few extra lines of code.
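As a rough sketch of what that looks like in the forward direction, here is a minimal two-hidden-layer forward pass in plain numpy. The names and layer sizes are illustrative assumptions chosen to match the equations below, not the interface of NeuralNetImport:

import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

# Illustrative shapes: 64 inputs, hidden layers of 60 and 50, 10 outputs.
w1 = np.random.rand(60, 64)   # input -> hidden layer 1
w2 = np.random.rand(50, 60)   # hidden layer 1 -> hidden layer 2
w3 = np.random.rand(10, 50)   # hidden layer 2 -> output

def forward(K):
    L = np.dot(w1, K)            # L = K x w1
    M = np.dot(w2, sigmoid(L))   # M = sigma(L) x w2
    N = np.dot(w3, sigmoid(M))   # N = sigma(M) x w3
    return sigmoid(N)            # y_hat = sigma(N)

y_hat = forward(np.random.rand(64))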


In [17]:
import NeuralNetImport as NN
import numpy as np
from sklearn.datasets import load_digits 
digits = load_digits()
import NNpix as npx
from IPython.display import HTML
from IPython.display import display

Neuron with 2 Hidden Layers


In [18]:
npx.cneuron2


Out[18]:
[image: diagram of a neuron in a network with 2 hidden layers]
Gradient Descent with 2 Hidden Layers


In [19]:
npx.derivation2


Out[19]:
[image: gradient descent derivation for 2 hidden layers]
In [20]:
f = open("HTML2.html")

In [21]:
display(HTML(f.read()))


| Diagram | Equations | Partial Derivatives |
| --- | --- | --- |
| $\hat{y}$ | $\hat{y}=\sigma(N)$ | $\frac{\partial\hat{y}}{\partial N} = \sigma'(N)$ |
| $N$ | $N = \sigma(M) \times w_3$ | $\frac{\partial N}{\partial M} = \sigma'(M) \times w_3$, $\frac{\partial N}{\partial w_3} = \sigma(M)$ |
| $M$ | $M = \sigma(L) \times w_2$ | $\frac{\partial M}{\partial L} = \sigma'(L) \times w_2$, $\frac{\partial M}{\partial w_2} = \sigma(L)$ |
| $L$ | $L = K \times w_1$ | $\frac{\partial L}{\partial w_1} = K$ |

| Weight | Gradient with Chain Rule | Gradient with Substitution |
| --- | --- | --- |
| $w_3$ | $\frac{\partial{C}}{\partial{w_3}} = -(y-\hat{y}) \frac{\partial{\hat{y}}}{\partial{N}} \frac{\partial{N}}{\partial{w_3}}$ | $\frac{\partial{C}}{\partial{w_3}} = -(y-\hat{y}) \times \sigma'(N) \times \sigma(M)$ |
| $w_2$ | $\frac{\partial{C}}{\partial{w_2}} = -(y-\hat{y}) \frac{\partial{\hat{y}}}{\partial{N}} \frac{\partial{N}}{\partial{M}} \frac{\partial{M}}{\partial{w_2}}$ | $\frac{\partial{C}}{\partial{w_2}} = -(y-\hat{y}) \times \sigma'(N) \times \sigma'(M) \times w_3 \times \sigma(L)$ |
| $w_1$ | $\frac{\partial{C}}{\partial{w_1}} = -(y-\hat{y}) \frac{\partial{\hat{y}}}{\partial{N}} \frac{\partial{N}}{\partial{M}} \frac{\partial{M}}{\partial{L}} \frac{\partial{L}}{\partial{w_1}}$ | $\frac{\partial{C}}{\partial{w_1}} = -(y-\hat{y}) \times \sigma'(N) \times \sigma'(M) \times w_3 \times \sigma'(L) \times w_2 \times K$ |

In [22]:
f.close()
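The substituted forms in the right column of the table translate almost line for line into code. A minimal scalar sketch of the three gradients (one neuron per layer, made-up values; the real implementation works on whole arrays):

import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

def sigmoid_prime(t):
    s = sigmoid(t)
    return s * (1 - s)

# Illustrative scalar values for one input/solution pair.
K, y = 0.5, 1.0
w1, w2, w3 = 0.1, 0.2, 0.3

# Forward pass, matching the diagram equations.
L = K * w1
M = sigmoid(L) * w2
N = sigmoid(M) * w3
y_hat = sigmoid(N)

# Gradients, matching the substituted forms in the table.
dC_dw3 = -(y - y_hat) * sigmoid_prime(N) * sigmoid(M)
dC_dw2 = -(y - y_hat) * sigmoid_prime(N) * sigmoid_prime(M) * w3 * sigmoid(L)
dC_dw1 = (-(y - y_hat) * sigmoid_prime(N) * sigmoid_prime(M) * w3
          * sigmoid_prime(L) * w2 * K)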

Create Training Inputs and Solutions

Use 1000 random samples to generate the training input and solution. The other 792 will be used to test.


In [23]:
perm = np.random.permutation(1792)  # shuffled indices into the digits data
training_input = np.array([digits.images[perm[i]].flatten() for i in range(1000)])/100  # flatten each 8x8 image and scale the pixel values down

In [24]:
training_solution = NN.create_training_soln(digits.target[perm], 10)
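NN.create_training_soln is presumably a one-hot encoder: each digit label becomes a length-10 vector with a 1 in the position of the correct digit. A minimal sketch of that idea (the real function's output layout may differ):

import numpy as np

def one_hot(targets, n_classes):
    """Turn an array of integer labels into one-hot rows."""
    soln = np.zeros((len(targets), n_classes))
    soln[np.arange(len(targets)), targets] = 1
    return soln

# e.g. one_hot(np.array([3, 0]), 10)[0] -> [0,0,0,1,0,0,0,0,0,0]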

In [25]:
train = NN.NN_training_2(training_input, training_solution, 64, 10, 60, 50, 80, 0.7)
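Reading the arguments against the assertions further down, the call appears to specify 64 input neurons (one per pixel of an 8×8 image), 10 output neurons (one per digit), and hidden layers of 60 and 50 neurons. The remaining 80 and 0.7 are presumably the iteration count and the learning rate; that is an inference from context, not documentation.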

Getting Weights

To train the network and generate the weights yourself, uncomment the line below. The cells after it load previously saved weights instead.


In [33]:
# x,y,z = train.train()

In [34]:
f = np.load("2HiddenWeights.npz")

In [35]:
x = f['arr_0']  # weights: input -> hidden layer 1
y = f['arr_1']  # weights: hidden layer 1 -> hidden layer 2
z = f['arr_2']  # weights: hidden layer 2 -> output

In [36]:
assert len(x) == 60
assert len(y) == 50
assert len(z) == 10

Find Solutions


In [37]:
ask = [NN.NN_ask_2(np.array([digits.images[perm[i]].flatten()])/100,x,y,z) for i in range(1000,1792)]

In [38]:
comp_vals = [ask[i].get_ans() for i in range(len(ask))]
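get_ans presumably returns the index of the strongest output neuron, i.e. the predicted digit, so comp_vals can be compared directly against digits.target below.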

Calculate Accuracy


In [39]:
test_solutions = np.array([digits.target[perm[i]] for i in range(1000, 1792)])
print(np.mean(np.array(comp_vals) == test_solutions) * 100, "%")


97.7272727273 %
